ABSTRACT


Over the last two years, many scientists have been researching solutions to
the coronavirus disease 2019 (COVID-19) pandemic. Effective inspection and
rapid diagnosis of COVID-19 help mitigate the burden on healthcare systems.
These research works focus on detecting the infection and establishing its
history in terms of time and developed symptoms. In infection detection,
artificial intelligence (AI) technologies increase the accuracy and
efficiency of the adopted detection methods. These methods aid medical staff
in classifying patients, especially when healthcare resources are short.
This paper proposes machine learning-based models for detecting the time of
COVID-19 infection in weeks using the laboratory factors of the detected
antibodies immunoglobulin G and immunoglobulin M (IgG-IgM). This test is
common and helpful in diagnosing suspected patients who had a
negative result for the reverse transcription-polymerase chain reaction
(RT-PCR) test. The proposed system considers two machine learning models
that adopt the root mean square error (RMSE) and mean absolute error (MAE)
factors. The results show an acceptable efficiency that ranges from 80% to
100% in assigning a patient to the correct week of infection, which helps
reduce the likelihood of transmission from patients who have developed
symptoms but received a false-negative RT-PCR test.
1. INTRODUCTION

The novel coronavirus disease 2019 (COVID-19) pandemic, caused by the severe acute respiratory
syndrome coronavirus 2 (SARS-CoV-2), first broke out in China at the end of 2019 and spread from
there to the rest of the world; it continues to pose a critical and urgent threat to global
health. As of August 19, 2021, there have been 209,201,939 confirmed cases of COVID-19 globally,
including 4,390,467 deaths, reported to the World Health Organization (WHO) [1]. The patient’s symptoms,
some laboratory tests, and chest computerised tomography (CT) scans are the main parameters for
diagnosing COVID-19. Furthermore, real-time reverse transcription-polymerase chain reaction
(RT-PCR) assays also play a vital role in the diagnostic process by detecting viral ribonucleic acid (RNA)
in throat swab or nasal swab specimens [2]. The main advantage of this assay is that it does not need a live
virus in the specimens. However, several factors, including the time of infection, the quality of the collected
samples, the quality of the testing machine, and the need for specialized staff and reliable laboratories,
limit the assay’s role in outbreak containment and in turn increase the rate of
false-negative results for confirmed patients [3].

Due to the similarities in the epidemiological features and clinical genetics between SARS-CoV-2
and the Middle East respiratory syndrome (MERS), the antibody generation process might be the same [4].
That means the detection of immunoglobulin G and immunoglobulin M (IgG and IgM) antibodies could
provide information on the time course of virus infection [5]. Based on the observed serological responses of
IgM and IgG in COVID-19 patients, the guideline for the diagnosis and treatment of COVID-19 issued by the
Chinese National Health Commission recommends using this rapid test
to confirm or exclude SARS-CoV-2 infection in suspected patients [2].

Throughout history, different diseases have afflicted people, and some are dangerous enough to cause death.
The COVID-19 pandemic has affected the world in different fields, such as trade and health. Recently, life
around the world almost stopped, including work and travel. Therefore, there is an urgent need to address
this problem in different ways. One of the most important is diagnosing COVID-19 infection, including in
patients who do not show any symptoms. Numerous medical tests and symptoms can lead to the
diagnosis of infected cases, yet these symptoms and test results can be confusing. Thus, digital and
information systems are necessary to solve the prediction problem in the early stages so that suitable
medical treatments can be provided [6], [7].

Through both hardware and software, digital systems can provide a beneficial service to the
medical field in predicting and diagnosing COVID-19 infections. This is done in two ways:
information systems and digital equipment. First, the information system helps hospital doctors collect
the information that eases understanding of the behavior of this disease. Moreover, this information
supports platforms that detect COVID-19 and other diseases efficiently based on the supplied medical
information and test results [8]. This reduces the pressure on doctors and increases diagnostic accuracy
using artificial intelligence (AI)-based technologies [9].

AI technologies notably increase accuracy by imitating human behavior to detect
a specific type of disease, including COVID-19. Different AI methods can be used in this field, such as
deep learning and machine learning, due to their efficiency in imitating the human behavior through which the
diagnosis is made [10], [11]. The machine learning technique is based on building more than one model
that is trained using a dataset related to the subject [12], [13]. These models provide the system with
classified and arranged results. These results help in reaching a soft decision with more accuracy than the
classical methods that adopt a hard decision [14]-[17].

Many researchers have worked on finding an efficient digital diagnosing system for detecting
COVID-19 infections. Padoan et al. [18] presented a comprehensive study on the ability of the MAGLUMI 2000
Plus CLIA assay to measure IgG and IgM antibodies in COVID-19 patients. Different tests were
performed on this assay, and the obtained results showed its efficiency in measuring IgG and IgM over
time intervals for a number of patients. In [19], the B-LiFe mobile laboratory was adopted to test the body
response in terms of IgG and IgM antibodies for different patients over a one-month period.
The sensitivity of the kit used in detecting these antibodies demonstrated acceptable
performance and high accuracy in identifying COVID-19 infections. In addition, a complete study on
the appearance of COVID-19 antibodies over time in days was introduced in [20]. All ratios and
probabilities of measuring the total IgG and IgM antibodies were listed, where the blood samples were
taken from day 1 to day 3 of the first infection. This study showed that patients with developed
antibodies could fight the virus, while the patients who died had low, constant ratios. In [2],
the antibodies of COVID-19 were measured and presented in a report that explained the behavior of
IgG and IgM over time from the first appearance of symptoms. IgG appeared after the first
week of infection and persisted for a long time, whereas IgM peaked within the first two
weeks and then started to decline.

In [21], an intelligent IgG-IgM detection system with high accuracy was proposed.
This system worked in two steps. The first step detected the increase of the
mentioned antibodies, while the second recognized the disease type and level.
Moreover, Mendels et al. [22] proposed a mobile application that used machine learning to
detect COVID-19 patients based on the rapid IgG and IgM test. The developed
application increased the detection accuracy and the reliability of positive and negative results.

Similarly, an intelligent COVID-19 detection system was proposed in [23] using a swarm
intelligence algorithm, evolutionary algorithms, and cost-minimization manual selection, in order to increase
the validity of the adopted results and the testing speed. Literature surveys of research on using AI
technologies to diagnose COVID-19 patients were prepared in [24], [25]. These surveys
tackled the perspectives and concepts adopted in producing intelligent detection systems. In [26],
an intelligent system was adopted to detect patients with COVID-19 depending on the levels of
serum antigens. This was another way of detecting the infection using blood samples.

Another review of published papers was introduced in [27], focusing on the important issues
facing the detection of COVID-19 infection cases. It considered papers that addressed these issues using
intelligent systems that improve the detection accuracy and the time required. In [28], two intelligent models
were proposed to detect COVID-19 infections. These models used deep learning and machine learning to
diagnose infections based on CT-scan images. The proposed system worked in two steps,
using the two models for detection depending on the type of image. Moreover, Li et al. [29] adopted a
deep learning model to detect infected lung images of COVID-19 patients. The considered
images came from a high-resolution pulmonary CT scan dataset. An in-depth literature review on the use of
intelligent algorithms for detecting infected cases in countries with a low standard of living was presented in [30].
The study focused on the types of AI methods and the images used for detection. In addition, Harmon et al. [31]
used datasets from different countries to support the training data in modelling and training the proposed
intelligent systems. This could widen the scope of detecting COVID-19 patients.

In this paper, COVID-19 cases are diagnosed in terms of the week of infection using the
proposed machine learning-based detection system, which relies on IgG and IgM antibody test results. These
results are fed to the machine learning models to determine the week of the first COVID-19 infection.
The machine learning models are trained using a dataset of real hospital cases with IgG and IgM results.
The implementation results of the proposed detection system show satisfactory efficiency in diagnosing the
week of COVID-19 infection.


2. PROPOSED SYSTEM 

In this paper, the COVID-19 infection time in weeks is detected so that infection in
patients with negative RT-PCR tests can be confirmed. This, in turn, decreases the possibility of
the infection being spread by people who carry the virus but whose infection has not been confirmed by the
RT-PCR tests. This section describes the proposed system in four sub-sections as follows.


2.1. The proposed system structures 
Figure 1 shows the block diagram of the proposed system. It includes the following: 

— The dataset is an important factor in the success of the proposed detection system, which adopts machine
learning technology. The dataset is divided into two parts: 80% training data and 20% testing data.

— These data are entered into the designed machine learning algorithm, which includes the algorithm and its
evaluation as a correction loop.

— The results are fed to the system model, which receives real-time production data from the samples of
patients to be tested and produces the week of COVID-19 infection for those patients.





Figure 1. The proposed system structure block diagram 

2.2. Dataset

The nature of the proposed system, which aims to find the infection week, determines the type of
data gathering process; accordingly, a dataset collected from the health
centres of [32] is considered. This dataset contains 425 samples taken from 285 patients, represented
as a comma-separated values (CSV) file to be read and used through the Python pandas library. The baseline
characteristics of the dataset’s patients are given in Table 1. The collected dataset needs pre-processing
steps to prepare it as the adopted dataset [33]. The most common issues include large values, missing data,
noisy data, or unorganized text data.

One of the essential steps in designing a machine learning system is the pre-processing of the
gathered data, due to its effect on model accuracy [34]. In simple words, data pre-processing
is the process of cleaning the raw data and making it more suitable for model training. Because the
dataset can include multiple types of data, conversion is one of the basic
pre-processing techniques. Since the machine learning models used here handle only
numeric data, the categorical data of the dataset are converted to numeric
features. As the next step, samples with missing values are dropped rather than filled in
manually, as is commonly done, in order to keep the real state of the collected data. A total of 363 usable
samples are passed to the next stage of searching for the model that best fits the type of data.
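
The two pre-processing steps described above (dropping incomplete samples and converting categorical data to numeric features) can be sketched with pandas as follows. This is a minimal illustration, not the code used in this work: the column names (`gender`, `IgG`, `IgM`, `days_since_onset`) and the five inline rows are hypothetical, since the layout of the CSV file from [32] is not given here.

```python
import io

import pandas as pd

# Hypothetical CSV content: column names and values are invented for
# illustration only and do not come from the dataset of [32].
RAW_CSV = """patient_id,gender,IgG,IgM,days_since_onset
1,Male,0.8,3.1,5
2,Female,12.4,,16
3,Female,25.0,1.2,24
4,,9.7,6.5,11
5,Male,18.3,2.2,20
"""

# Assumed list of categorical columns that must become numeric.
CATEGORICAL_COLUMNS = ["gender"]


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    """Drop samples with missing values, then encode categorical columns."""
    cleaned = frame.dropna().copy()  # missing values are ignored, not filled
    for column in CATEGORICAL_COLUMNS:
        # Categories are sorted alphabetically: Female -> 0, Male -> 1.
        cleaned[column] = cleaned[column].astype("category").cat.codes
    return cleaned.reset_index(drop=True)


raw = pd.read_csv(io.StringIO(RAW_CSV))
clean = preprocess(raw)
print(f"{len(raw)} raw samples -> {len(clean)} usable samples")
```

Applied to the real file, the same pattern would correspond to the reduction from 425 collected samples to 363 usable ones reported above.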

The main goal when designing a machine learning system is to find the model that best
fits the used dataset [34]. Between the supervised and unsupervised learning types, supervised
learning is the appropriate choice because the used dataset is labelled.
Supervised learning in turn has two categories, classification and regression. Due
to the continuous nature of the selected target variable, regression algorithms based on decision tree
regressor methods [35] are adopted.


Table 1. Baseline characteristics of dataset’s patients [32]

Characteristic | All patients (285) | Severe or critical condition (39) | Non-severe or critical condition (246)
Age range (years) | 37-56 | 46-77 | 36-55
Gender: Male | 158 | 21 | 137
Gender: Female | 127 | 18 | 109
Incubation period (days) | 5-12 | 8-10 | 5-12
Comorbidities | | |
Hypertension | 41 | 7 | 34
Cardiovascular disease | 9 | 3 | 6
Diabetes | 22 | 7 | 15
Malignancy | 3 | 1 | 2
Chronic kidney disease | 2 | 0 | 2
Chronic liver disease | 17 | 5 | 12
Hypoxemia | 4 | 2 | 2
Tuberculosis | 6 | 1 | 5
Signs and symptoms (N of patients) | | |
Fever | 168 | 24 | 144
Fatigue | 79 | 20 | 59
Dry cough | 157 | 22 | 135
Anorexia | 66 | 18 | 48
Myalgia | 35 | 5 | 30
Dyspnea | 19 | 14 | 5
Expectoration | 66 | 14 | 32
Pharyngalgia | 30 | 2 | 28
Diarrhea | 25 | 4 | 21
Nausea | 17 | 4 | 13
Dizziness | 24 | 6 | 18
Headache | 30 | 4 | 26
Vomiting | 7 | 2 | 5
Abdominal pain | 6 | 2 | 4
Chill | 43 | 7 | 36
Nasal congestion | 10 | 2 | 8
Rhinorrhea | 13 | 2 | 11
Chest stuffiness | 42 | 8 | 34
Heart rate range (bpm) | 80-97 | 84-96 | 78-97

2.3. The proposed algorithm

Algorithms can be written in different styles to present their contents
clearly. Figure 2 presents the proposed algorithm, which manages the proposed COVID-19 infection
week system, as a workflow chart.

The adopted work steps can be summarized as:

— Collecting the dataset from the considered resources.

— Checking the validity of the collected dataset. This step works in a feedback loop with the collecting step to make
sure that all received data are valid for use in the machine learning models.

— Feeding the obtained data to the two designed machine learning models.

— Checking the results of the two machine learning models. This step has a feedback link to the stage of
applying the machine learning models to ensure validity.

— Entering these results into the checking model, which checks the input offline samples against the trained
output. The input samples contain the IgG and IgM readings of patients after processing the blood
samples.

The output results give the week of infection of these samples.


[Figure 2 flowchart blocks: Dataset collecting; Dataset validity; Dataset pre-processing; Applying the designed ML models; Checking the results validity; Samples to be predicted; Model checking; Results: week of infection]





Figure 2. The proposed algorithm workflow 


2.4. Machine learning models 

As explained in the previous sections, the proposed system adopts two machine learning models for
computing the week of COVID-19 infection. These models are assessed using the root mean squared error (RMSE) and
mean absolute error (MAE) values between the trained and the tested data. The RMSE can be computed
using [36]:


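$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \hat{x}_i)^2}{N}} \tag{1}$$

where $N$ is the total number of samples, $x_i$ is the trained sample, and $\hat{x}_i$ is the tested one. The MAE is
computed as [36]:

$$\mathrm{MAE} = \frac{\sum_{i=1}^{N} |x_i - \hat{x}_i|}{N} \tag{2}$$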
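
As a small numerical illustration of (1) and (2), the two error measures can be computed directly from a pair of sequences. The sketch below uses only the Python standard library; the four reference and predicted day values are invented for illustration and are not taken from the dataset.

```python
import math


def rmse(reference, predicted):
    """Root mean squared error between two equally long sequences."""
    n = len(reference)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(reference, predicted)) / n)


def mae(reference, predicted):
    """Mean absolute error between two equally long sequences."""
    n = len(reference)
    return sum(abs(x - y) for x, y in zip(reference, predicted)) / n


# Illustrative values only (days since infection), not data from the paper.
reference_days = [3, 10, 17, 24]
predicted_days = [5, 9, 20, 24]

# Errors are -2, 1, -3, 0: squared sum 14, absolute sum 6.
print(f"RMSE = {rmse(reference_days, predicted_days):.3f}")  # sqrt(14 / 4)
print(f"MAE  = {mae(reference_days, predicted_days):.3f}")   # 6 / 4
```

Because squaring weights large deviations more heavily, the RMSE is never smaller than the MAE on the same data, which is consistent with the values reported for both models below.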


The designed models of machine learning are: 

— Decision Tree Regressor model (with RMSE = 4.3 and MAE = 3.29), 

— Random Forest Regressor model (with RMSE = 4.11 and MAE = 3.21). 

These models are well known, and the RMSE and MAE values for each model are obtained by experiment. 
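
A minimal sketch of how two such regressors could be trained and scored is given below. It assumes scikit-learn, which is not named in this paper, and it runs on synthetic IgG and IgM readings generated inside the snippet, because the dataset of [32] is not reproduced here. The antibody curves, the noise levels, and the use of days as the regression target are all assumptions, so the printed errors will not match the values listed above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (assumption): 363 samples, target = day of infection.
rng = np.random.default_rng(0)
n_samples = 363
days = rng.integers(1, 29, size=n_samples)

# Invented antibody curves: IgM peaks early, IgG rises later and stays high.
igm = 10.0 * np.exp(-(((days - 12) / 8.0) ** 2)) + rng.normal(0.0, 1.0, n_samples)
igg = 20.0 / (1.0 + np.exp(-(days - 14) / 3.0)) + rng.normal(0.0, 1.5, n_samples)

X = np.column_stack([igg, igm])
y = days.astype(float)

# 80% training data and 20% testing data, as in the proposed system structure.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

models = {
    "Decision Tree Regressor": DecisionTreeRegressor(random_state=0),
    "Random Forest Regressor": RandomForestRegressor(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predicted = model.predict(X_test)
    model_rmse = float(np.sqrt(mean_squared_error(y_test, predicted)))
    model_mae = float(mean_absolute_error(y_test, predicted))
    scores[name] = (model_rmse, model_mae)
    print(f"{name}: RMSE = {model_rmse:.2f}, MAE = {model_mae:.2f}")
```

With 363 samples, the 80/20 split yields 290 training and 73 testing samples.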

3. RESULTS

The performance of the proposed detection system is tested on ten case studies.
These ten cases are represented by the IgG and IgM values of patients suspected to have
the COVID-19 virus. Different software environments are used to design the simulator that simulates the real
case of the designed models.

The obtained results, which indicate the week of infection of the considered patients, are shown
in Figure 3. These results are the average values of the RMSE and MAE for each case of ten. Figure 3
shows the infection week of the ten cases, with samples taken one day after symptoms appear. The days shown with the cases
represent the real days of infection, while the patterned bars are the weeks from 1 to 4. The designed machine
learning models decide the week of infection, which is shown in the four-week pattern. It is concluded that
most cases are detected with the exact week of infection, except cases 3 and 7. These cases occurred on
days 8 and 6, respectively; thus, they are close to the boundary between two consecutive weeks.

To test and evaluate the proposed system further, Table 2 shows the efficiency for different
numbers of cases tested using the designed machine learning models. The table shows varying
efficiency ratios, which arise for different reasons, including:

— The day on which the sample is taken after symptoms appear.

— The real day of infection can fall at the transition between two weeks.

— The sample data are not valid.

Therefore, the results vary between 80% and 100%. The efficiency is computed as:


$$\mathrm{Efficiency} = \frac{\text{No. of right decisions}}{\text{Total samples}} \times 100\% \tag{3}$$
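
The efficiency in (3) can be illustrated with a short sketch that converts days to weeks and counts matching week decisions. Two points are assumptions made only for this illustration: weeks are taken as consecutive 7-day blocks capped at week 4, and the models are assumed to output a day estimate that is then mapped to a week. The ten day values are invented, chosen so that the two samples near the day 7 boundary are the ones that are missed.

```python
import math


def day_to_week(day: float) -> int:
    """Map a day of infection to a week index from 1 to 4 (7-day weeks assumed)."""
    return min(4, max(1, math.ceil(day / 7)))


def efficiency(true_weeks, decided_weeks) -> float:
    """Percentage of samples whose decided week equals the true week."""
    right = sum(t == d for t, d in zip(true_weeks, decided_weeks))
    return 100.0 * right / len(true_weeks)


# Illustrative values only, not the ten cases of Figure 3.
true_days = [2, 5, 8, 12, 15, 19, 6, 23, 26, 10]
estimated_days = [3.1, 4.2, 6.4, 11.0, 16.5, 18.2, 7.8, 22.0, 27.3, 9.1]

true_weeks = [day_to_week(d) for d in true_days]
decided_weeks = [day_to_week(d) for d in estimated_days]

print(f"Efficiency = {efficiency(true_weeks, decided_weeks):.0f}%")
```

In this example the samples at days 8 and 6 receive the neighbouring week, so 8 of 10 decisions are right, which shows how a small day error near a week boundary lowers the efficiency.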


[Figure 3 plot title: The RMSE for the likely week prediction]

Figure 3. Average of RMSE values for ten cases and the week decision


Table 2. The designed machine learning models’ efficiency for different numbers of cases

No. of samples | Sampling time | No. of right decisions of infection week | Efficiency
10 | 1-2 days | 8 | 80%
30 | 1 day | 29 | 97%
50 | 2 days | 47 | 94%
100 | 1 day | 99 | 99%
150 | 1 day | 140 | 94%
200 | 1 day | 193 | 97%
210 | 1 day | 204 | 97%
220 | 1 day | 220 | 100%
300 | 1 day | 298 | 99%
400 | 1 day | 396 | 99%


4. CONCLUSION 

A detection system for the week of COVID-19 infection was proposed based on machine learning
models. The training dataset was pre-processed to provide valid data ready for the designed models.
The designed machine learning models adopted the decision tree regressor and random forest regressor,
with RMSE values of 4.3 and 4.11 and MAE values of 3.29 and 3.21, respectively, as a tradeoff threshold for
decision. The proposed system proved its ability to detect the right week of infection, as shown in the
considered results. The obtained results showed that the proposed system’s efficiency ranges
from 80% to 100%, depending on the day of sampling and on whether the infection day falls between two weeks.